We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely on interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes are larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then a trait will only depend on genes.
This interaction can be visualized as follows:
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability,
ylim = c(0,1),
xlab = "environment",
ylab = "heritability (effect of genes)")
Here is a DAG that describes such a model, where
and the Scarr-Rowe effect means the the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).
library(dagitty)
dag = dagitty(
"dag{
IQ;
A -> IQ;
SES -> IQ;
}")
coord.list =
list(
x=c(A=0,SES=0,IQ=1),
y=c(A=-.5,SES=.5,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)
Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of four other variables \(\small IQ = f(A,SES)\), but it does not tell us what the function \(f()\) is.
This function could be
or any other imaginable function that uses the variables. The model we have in mind now is:
\[ IQ = f(SES) A \]
Lets simulate data that show the the Scarr-Rowe effect, first our exogenous variables. To keep things simple, we assume that all variables are normally distributed:
set.seed(3)
N = 250
SES = rnorm(N,mean = 5,1)
A = rnorm(N,mean = 10,2)
SES = rlnorm(N,log(4.75),.3)
A = rlnorm(N,log(10),.2)
Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on the SES. So we formulate the weight of genes as as a function of SES. For good measure, we also let the effect of the familial environment vary by SES.
library(boot)
# use inv.logit to give weights a lower and upper bound
# add a constant one to model that even at lowest SES
# there are genetic effects
b_A = function(x) 0.5 + inv.logit(x-5) * 2.5
We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights, with a histogram of SES values on the bottom.
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),0,10,
ylab = expression("effect size"~beta[A]),
ylim = c(0,b_A(10)), xlab = "SES")
hist(SES,add = T, probability = T)
In this plot, the interaction is encoded by the fact that \(\small \beta_A\) has a slope.
Now we can simulate IQ values using exogenous variables and derived weights. We assume that IQ depends on \(\small A\), whereby this effect depends on \(\small SES\): \(\small IQ = f(SES)A\). Lets simulate data from this model:
# Adding 70 to get an IQ around 100
IQ = 85 + b_A(SES)*A + rnorm(N,sd = 10)
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
hist(IQ)
If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:
par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
plot(A,IQ, pch = 16, cex = .5,)
To run a simple interaction with a categorical interaction variable
we dichotomise the SES variable to create the variable
SES.c:
low.SES = which(SES < quantile(SES,.50))
high.SES = which(SES > quantile(SES,.50))
idx = c(low.SES,high.SES)
dt = data.frame(
A = A[idx],
IQ = IQ[idx],
SES = SES[idx],
SES.c = c(rep(1,length(low.SES)),
rep(2,length(high.SES))))
Lets plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:
low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function(dt) {
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
legend("topleft", pch = 16, col = c("black","red"),
legend = c("low SES","high SES"), bty = "n")
}
plot_data(dt)
#abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
#abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")
Next, we estimate some linear regression models with increasing complexity, our goal being to have a model that accurately depicts the effect of A, C, E, and SES on the IQ.
We start with a simple linear regression. But first some intuitions on priors:
a ~ dnorm(100,5)a ~ dnorm(0,4). This prior allows for the possibility of a
negative effect of \(\small A\).dexp(0.25)Here is the regression model:
IQ.A =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + b*(A - 10),
a ~ dnorm(100,5),
b ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
Lets quickly look at the prior predictions to make sure the piors are OK.
prior = extract.prior(IQ.A)
A.seq = seq(from=0,to=20,length.out=50)
mu = link(
IQ.A,
post=prior,
data=data.frame(A=A.seq))
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(
0,type = "n", ylab = "IQ", xlab = "A",
xlim = c(min(A),max(A)),
ylim = c(70,130))
matlines(
A.seq,t(mu[1:100,]),type = "l", lty = 1,
col = adjustcolor("blue", alpha = .5))
Yes, this looks good.
Here are the posterior predictions:
plot_data(dt)
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)
This looks generally fine, but we are not capturing the group effects.
To fit a model with a main effect of education, we use the indexing approach:
set.seed(1)
IQ.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b*(A - 10),
a[SES.c] ~ dnorm(100,5),
b ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
plot_data(dt)
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")
We can see already from the plot the the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:
compare(
IQ.A,IQ.A_SES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.A_SES 1885.85 22.54 0.00 NA 4.24 1
## IQ.A 1958.74 18.50 72.89 16.64 3.15 0
Lastly, we can also let the slopes vary by SES:
IQ.AxSES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a[SES.c] + b[SES.c]*(A - 10),
a[SES.c] ~ dnorm(100,5),
b[SES.c] ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
The key part of this model is that we are not specifying a main effect for A and an interaction effect with SES, but that we are estimating 2 regression coefficients, one for high and one for low SES. This simplifies putting reasonable priors on the effect in each group, but also implies that we do not put a prior on the difference between the two groups.
And again the posterior predictions:
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot_data(dt)
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")
Even though the DGP does not have different “intercepts” for high and log SES, we have to add it, because our model puts the intercept at A = 10, whereas it was A = 0 in the DGP.
Here is a comparison of all the models we have fit:
compare(
IQ.A, IQ.A_SES, IQ.AxSES,
func = "PSIS") %>%
round(2)
## PSIS SE dPSIS dSE pPSIS weight
## IQ.AxSES 1885.08 22.55 0.00 NA 5.20 0.65
## IQ.A_SES 1886.35 22.63 1.28 3.45 4.48 0.35
## IQ.A 1958.88 18.61 73.80 17.20 3.20 0.00
compare(
IQ.A, IQ.A_SES, IQ.AxSES,
func = "WAIC") %>%
round(2)
## WAIC SE dWAIC dSE pWAIC weight
## IQ.AxSES 1885.74 22.55 0.00 NA 5.53 0.55
## IQ.A_SES 1886.11 22.57 0.37 3.49 4.38 0.45
## IQ.A 1958.55 18.53 72.81 17.28 3.01 0.00
While the correct model has the best PSIS and WAIC value, for the top two models the differences are not large compared to the SEs of the differences.
How can we figure out if the difference in slopes is reliably larger than 0? We simply extract posteriors and calculate the difference in slopes from them:
# extract posterior
post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a" "b"
head(post$b)
## [,1] [,2]
## [1,] 0.1784402 1.2190632
## [2,] 0.1727957 1.6077497
## [3,] 0.6086835 1.4385735
## [4,] 0.8849390 1.7216203
## [5,] 0.3879717 0.6824638
## [6,] 0.6439692 0.7903504
We simply calculate the differences of the two b parameters:
# calculate difference
delta.b = post$b[,2]-post$b[,1]
And now we can show e.g. mean and PIs:
c(mean = mean(delta.b),
PI(delta.b,prob = c(.9))) %>%
round(2)
## mean 5% 95%
## 1.04 0.00 2.07
And we can plot a histogram of the contrast:
par(mar=c(3,3,2,1), mgp=c(1.25,.5,0), tck=-.01)
hist(delta.b, xlim = range(c(0,delta.b)))
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")
To see if our results are reasonable, lets compare our estimated parameters with the parameters governing the DGP. First the model parameters:
precis(IQ.AxSES, depth = 2) %>% round(2)
## mean sd 5.5% 94.5%
## a[1] 94.71 0.90 93.27 96.16
## a[2] 106.26 0.91 104.81 107.70
## b[1] 0.35 0.45 -0.36 1.06
## b[2] 1.39 0.43 0.70 2.08
## sigma 10.18 0.45 9.47 10.90
Now the parameters from the DGP:
rbind(
mean(b_A(SES[SES<quantile(SES,.30)])),
mean(b_A(SES[SES>quantile(SES,.70)])))
## [,1]
## [1,] 0.9157554
## [2,] 2.4132115
We are not recovering the exact parameters, after all we used a golem instead of a model that depicts the DGP, but qualitatively the results are consistent with the DGP.
Earlier, we described the function
\[ IQ = f(SES) A \]
By saying that the effect \(\small A\) is a function of \(\small SES\). However, we really just have two variables: \(\small A\) and \(\small f(SES)\) which are multiplied with each other. Therefore, these statements are equivalent:
Accordingly, we can also plot the difference between high and low \(\small SES\) as a function of \(\small A\):
A.seq = seq(from=5,to=16,length.out=50)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=1,A=A.seq))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=2,A=A.seq))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(A.seq, colMeans(delta),'l',
ylim = range(CIs),
col = "blue",
xlab = "A",
ylab = expression(IQ[high~SES]~-~IQ[low~SES]))
shade(CIs,A.seq, col = adjustcolor("blue", alpha = .25))
abline(h = 0, lty = 2)
We are continuing with our simulated Scarr-Rowe data set, but this time we use the full data and formulate a continuous interaction.
dt = data.frame(
A = A,
IQ = IQ,
SES = SES)
You can try it the fancy way and make a 3d plot, but in this instance its a lot of effort for meager results:
library(plot3D)
points3D(A,SES,IQ, type = "h",
col = "black",
cex = .75,
lty = 3,
pch = 16,
phi = 20,
theta = 45,
xlab = "A",
ylab = "SES",
zlab = "IQ")
A panel of 2d plots does the job better, and will later allow to display uncertainty.
qs = quantile(SES, probs = seq(0,1,.25))
par(mfrow = c(1,4), mar=c(2.5,2.5,2,.5), mgp=c(1.5,.5,0), tck=-.01)
for (k in 2:length(qs)) {
idx = which(dt$SES > qs[k-1] & dt$SES < qs[k])
tmp.dt = dt[idx,]
with(tmp.dt,
plot(IQ~A, pch = 16, main = paste0(k-1,". quantile SES"),
ylim = range(dt$IQ), xlim = range(dt$A)))
}
How many panels one uses depends on how one assumes the moderator
influences the effect of interest.
This first model assumes that there are just main effects of \(\small A\) and \(\small SES\). Indeed, this is not an unreasonable assumption if one remembers the raw data, which we show here again:
par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
plot(A,IQ, pch = 16, cex = .5,)
In preparation of the interaction model we are not centering \(\small SES\), but we are re-scaling it to have a minimum just above 0. If we would center to zero, below zero SES values would predict a negative additive genetic effect.
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,10),
ba ~ dnorm(0,4),
bs ~ dnorm(0,4),
sigma ~ dexp(.5)
),
data=dt)
We are using prior predictions to check if the model does a good job:
plot.pred(IQc.A_SES,dt, type = "prior")
This is pretty wild. Lets try a bit narrower priors:
set.seed(1)
IQc.A_SES =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.A_SES,dt, type = "prior")
These priors are reasonable, so we look at the predictions:
plot.pred(IQc.A_SES,dt)
As expected from the model we specified, all slopes are the same.
Note that the different “intercepts” we seem to come from this part of
the model: mu <- ... + bs*(SES - 2.5),.
The model is similar to the previous, but adds this interaction term:
bsa*(A - 10)*(SES - 2.5).
IQc.AxSES.m =
quap(
alist(
IQ ~ dnorm(mu,sigma),
mu <- a + ba*(A - 10) + bs*(SES - 2.5) + bsa*(A - 10)*(SES - 2.5),
a ~ dnorm(100,7.5),
ba ~ dnorm(0,2),
bs ~ dnorm(0,2),
bsa ~ dnorm(0,1),
sigma ~ dexp(.5)
),
data=dt)
plot.pred(IQc.AxSES.m,dt, type = "prior")